A comparative study on feature reduction approaches in Hindi and Bengali named entity recognition
Abstract
Features used for named entity recognition (NER) are often high-dimensional. This can cause overfitting when training data is insufficient, and dimensionality reduction leads to performance enhancement in such situations. There are a number of approaches for dimensionality reduction based on feature selection and feature extraction. In this paper we perform a comprehensive, comparative study of different dimensionality reduction approaches applied to the NER task. To compare the performance of the various approaches we consider two Indian languages, namely Hindi and Bengali. NER accuracies achieved in these languages are still comparatively poor, primarily due to the scarcity of annotated corpora. For both languages, dimensionality reduction is found to improve the performance of the classifiers. A comparative study of the effectiveness of several dimensionality reduction techniques is presented in detail in this paper. © 2011 Elsevier B.V. All rights reserved.
Similar resources
CRF-based Named Entity Recognition @ICON 2013
This paper describes performance of CRF based systems for Named Entity Recognition (NER) in Indian language as a part of ICON 2013 shared task. In this task we have considered a set of language independent features for all the languages. Only for English a language specific feature, i.e. capitalization, has been added. Next the use of gazetteer is explored for Bengali, Hindi and English. The ga...
Linguistic Issues in Language Technology – LiLT
This paper describes the development of Named Entity Recognition (NER) systems for two leading Indian languages, namely Bengali and Hindi using the Conditional Random Field (CRF) framework. The system makes use of different types of contextual information along with a variety of features that are helpful in predicting the different named entity (NE) classes. This set of features includes langua...
Maximum Entropy Approach for Named Entity Recognition in Bengali and Hindi
This paper reports about the development of a Named Entity Recognition (NER) system in two leading Indian languages, namely Bengali and Hindi using the Maximum Entropy (ME) framework. We have used the annotated corpora, obtained from the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL) and tagged with a fine-grained Named Entity (NE) tagset of twelve tags. An appropr...
A Two Stage Language Independent Named Entity Recognition for Indian Languages
This paper describes about the development of a two stage hybrid Named Entity Recognition (NER) system for Indian Languages particularly for Hindi, Oriya, Bengali and Telugu. We have used both statistical Maximum Entropy Model (MaxEnt) and Hidden Markov Model (HMM) in this system. We have used variety of features and contextual information for predicting the various Named Entity (NE) classes. T...
Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali a...
Journal: Knowl.-Based Syst.
Volume: 27
Publication date: 2012